Camera calibration using depth data

ABSTRACT

An apparatus is described herein. The apparatus includes an image capture module to capture depth data and sensor data for a plurality of views and an extraction module to extract a first plurality of features from the depth data and a second plurality of features from the sensor data for each view. The apparatus also includes a correspondence module to locate corresponding features in the first plurality of features and the second plurality of features for each view and a depth module to generate three-dimensional data for each feature of the first plurality of features for each view. Additionally, the apparatus includes a calibration module to calibrate the multiple cameras by matching the generated three dimensional data with the corresponding features in the first plurality of features and the second plurality of features.

BACKGROUND ART

Electronic devices, such as tablets, phablets, smartphones, mobile phones, desktops, laptops, gaming devices, all-in-one systems, and the like may include various cameras for capturing images. The electronic devices may require calibration of a camera with regard to how the camera perceives the world. In some cases, the electronic device may include a plurality of cameras configured to work individually or in concert. Each camera may be used to capture different image information, such as depth, color space, or other information regarding a scene.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of an exemplary system that enables camera calibration using depth data;

FIG. 2A is a diagram illustrating a target to be used in calibration;

FIG. 2B is an illustration of a target placed in error for use in calibration;

FIG. 3 is an illustration of a plurality of views of a calibration target;

FIG. 4A is a process flow diagram of a method that enables camera calibration using depth data;

FIG. 4B is a process flow diagram of another method that enables camera calibration using depth data;

FIG. 5 is a line graph illustrating error with traditional calibration and averaged calibration; and

FIG. 6 is a block diagram showing media that contains logic for camera calibration using depth data.

The same numbers are used throughout the disclosure and the figures to reference like components and features. Numbers in the 100 series refer to features originally found in FIG. 1; numbers in the 200 series refer to features originally found in FIG. 2; and so on.

DETAILED DESCRIPTION

Typically, each camera of a plurality of cameras in a system is calibrated prior to use. In some cases, the calibration of multiple cameras requires capturing many images of a calibration target at different angles and distances from the cameras of a given system that includes multiple cameras. The calibration target may be a pattern with an easily distinguished geometry. For example, the calibration target may be a printed checkerboard. In the typical calibration process, the checkerboard must have a precise and accurate physical geometry. There are often high costs associated with generating a precise and accurate calibration target.

Embodiments described herein enable camera calibration using depth data. In embodiments, the present techniques enable depth camera to red, green, blue (RGB) camera calibration. Through this calibration, the depth camera and RGB camera may be set such that the cameras properly capture images. Calibration may be performed from a depth sensor to an RGB sensor, utilizing available three dimensional depth data instead of limited two dimensional image data. The actual physical target geometry is computed as part of the process, rather than required as input to the calibration process.

Some embodiments may be implemented in one or a combination of hardware, firmware, and software. Further, some embodiments may also be implemented as instructions stored on a machine-readable medium, which may be read and executed by a computing platform to perform the operations described herein. A machine-readable medium may include any mechanism for storing or transmitting information in a form readable by a machine, e.g., a computer. For example, a machine-readable medium may include read only memory (ROM); random access memory (RAM); magnetic disk storage media; optical storage media; flash memory devices; or electrical, optical, acoustical or other form of propagated signals, e.g., carrier waves, infrared signals, digital signals, or the interfaces that transmit and/or receive signals, among others.

An embodiment is an implementation or example. Reference in the specification to “an embodiment,” “one embodiment,” “some embodiments,” “various embodiments,” or “other embodiments” means that a particular feature, structure, or characteristic described in connection with the embodiments is included in at least some embodiments, but not necessarily all embodiments, of the present techniques. The various appearances of “an embodiment,” “one embodiment,” or “some embodiments” are not necessarily all referring to the same embodiments. Elements or aspects from an embodiment can be combined with elements or aspects of another embodiment.

Not all components, features, structures, characteristics, etc. described and illustrated herein need be included in a particular embodiment or embodiments. If the specification states a component, feature, structure, or characteristic “may”, “might”, “can” or “could” be included, for example, that particular component, feature, structure, or characteristic is not required to be included. If the specification or claim refers to “a” or “an” element, that does not mean there is only one of the element. If the specification or claims refer to “an additional” element, that does not preclude there being more than one of the additional element.

It is to be noted that, although some embodiments have been described in reference to particular implementations, other implementations are possible according to some embodiments. Additionally, the arrangement and/or order of circuit elements or other features illustrated in the drawings and/or described herein need not be arranged in the particular way illustrated and described. Many other arrangements are possible according to some embodiments.

In each system shown in a figure, the elements in some cases may each have a same reference number or a different reference number to suggest that the elements represented could be different and/or similar. However, an element may be flexible enough to have different implementations and work with some or all of the systems shown or described herein. The various elements shown in the figures may be the same or different. Which one is referred to as a first element and which is called a second element is arbitrary.

FIG. 1 is a block diagram of an exemplary system that enables camera calibration using depth data. The electronic device 100 may be, for example, a laptop computer, tablet computer, mobile phone, smart phone, or a wearable device, among others. The electronic device 100 may include a central processing unit (CPU) 102 that is configured to execute stored instructions, as well as a memory device 104 that stores instructions that are executable by the CPU 102. The CPU may be coupled to the memory device 104 by a bus 106. Additionally, the CPU 102 can be a single core processor, a multi-core processor, a computing cluster, or any number of other configurations. Furthermore, the electronic device 100 may include more than one CPU 102. The memory device 104 can include random access memory (RAM), read only memory (ROM), flash memory, or any other suitable memory systems. For example, the memory device 104 may include dynamic random access memory (DRAM).

The electronic device 100 also includes a graphics processing unit (GPU) 108. As shown, the CPU 102 can be coupled through the bus 106 to the GPU 108. The GPU 108 can be configured to perform any number of graphics operations within the electronic device 100. For example, the GPU 108 can be configured to render or manipulate graphics images, graphics frames, videos, streaming data, or the like, to be rendered or displayed to a user of the electronic device 100. In some embodiments, the GPU 108 includes a number of graphics engines, wherein each graphics engine is configured to perform specific graphics tasks, or to execute specific types of workloads.

The CPU 102 can be linked through the bus 106 to a display interface 110 configured to connect the electronic device 100 to one or more display devices 112. The display devices 112 can include a display screen that is a built-in component of the electronic device 100. In embodiments, the display interface 110 is coupled with the display devices 112 via any networking technology such as cellular hardware 126, WiFi hardware 128, or Bluetooth Interface 130 across the network 132. The display devices 112 can also include a computer monitor, television, or projector, among others, that is externally connected to the electronic device 100.

The CPU 102 can also be connected through the bus 106 to an input/output (I/O) device interface 114 configured to connect the electronic device 100 to one or more I/O devices 116. The I/O devices 116 can include, for example, a keyboard and a pointing device, wherein the pointing device can include a touchpad or a touchscreen, among others. The I/O devices 116 can be built-in components of the electronic device 100, or can be devices that are externally connected to the electronic device 100. Accordingly, in embodiments, the I/O device interface 114 is coupled with the I/O devices 116 via any networking technology such as cellular hardware 126, WiFi hardware 128, or Bluetooth Interface 130 across the network 132. The I/O devices 116 can also include any I/O device that is externally connected to the electronic device 100.

The electronic device 100 also includes image capture mechanisms 118. The image capture mechanisms may be a plurality of cameras. The image capture mechanisms 118 may also include a plurality of sensors. In embodiments, the image capture mechanisms 118 may be a depth camera and an RGB camera. Additionally, in embodiments, the image capture mechanisms 118 may be a depth sensor and an RGB sensor. In some embodiments, the image capture mechanisms 118 can be a camera, stereoscopic camera, infrared sensor, and the like. The image capture mechanisms 118 are used to capture image information and the corresponding depth information. The image capture mechanisms 118 may include sensors such as a depth sensor, RGB sensor, an image sensor, an infrared sensor, an X-Ray photon counting sensor, a light sensor, or any combination thereof. The image sensors may include charge-coupled device (CCD) image sensors, complementary metal-oxide-semiconductor (CMOS) image sensors, system on chip (SOC) image sensors, image sensors with photosensitive thin film transistors, or any combination thereof. In some embodiments, a sensor is a depth sensor. The depth sensor may be used to capture the depth information associated with the image information. In some embodiments, a driver may be used to operate a sensor within the image capture mechanisms 118, such as a depth sensor.

The electronic device 100 may also include a calibration mechanism 120. The calibration mechanism may use depth data to make the calibration process more robust. In embodiments, the calibration mechanism 120 is to calibrate a plurality of cameras using depth data, thereby removing the burden of having accurate targets and potentially introducing new calibration scenarios. New calibration scenarios include, but are not limited to, natural objects as targets instead of designated checkerboard targets. During calibration, the depth camera center is maintained at a center location relative to the calibration target. For example, the location may be described according to Cartesian coordinates, where x, y, and z are each zero. This single frame of reference is used to bring all views of the calibration target onto a reference location to create an average calibration target model. In embodiments, a calibration target model is a function used to transform between world and image coordinates. Calibration is used to obtain the correct model parameters.

In embodiments, camera calibration results in a linear calibration target model that is defined by eleven parameters. The eleven parameters include camera location (3 parameters: x, y, z), orientation (3 parameters: roll, pitch, and yaw), focal length (1 parameter: distance), pixel scale (1 parameter), pixel aspect ratio (1 parameter), and image plane center offset (2 parameters). The calibration target model, through these parameters, defines a projection for each camera that maps a three-dimensional point in space to a two-dimensional location in the camera image.
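
By way of illustration only, the following Python sketch shows how these eleven parameters could be assembled into such a projection. The function names, the roll-pitch-yaw composition order, and the sample values are assumptions made for this example and are not taken from the disclosure.

    import numpy as np

    def rotation_from_rpy(roll, pitch, yaw):
        # Compose a rotation matrix from roll, pitch, and yaw (radians).
        cr, sr = np.cos(roll), np.sin(roll)
        cp, sp = np.cos(pitch), np.sin(pitch)
        cy, sy = np.cos(yaw), np.sin(yaw)
        rx = np.array([[1, 0, 0], [0, cr, -sr], [0, sr, cr]])
        ry = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])
        rz = np.array([[cy, -sy, 0], [sy, cy, 0], [0, 0, 1]])
        return rz @ ry @ rx

    def project_point(point, cam_pos, rpy, focal, pixel_scale, aspect, center):
        # Eleven parameters: 3 for location, 3 for orientation, 1 focal length,
        # 1 pixel scale, 1 pixel aspect ratio, 2 for the image plane center offset.
        r = rotation_from_rpy(*rpy)
        p_cam = r @ (np.asarray(point, dtype=float) - np.asarray(cam_pos, dtype=float))
        fx = focal * pixel_scale
        fy = focal * pixel_scale * aspect
        u = fx * p_cam[0] / p_cam[2] + center[0]
        v = fy * p_cam[1] / p_cam[2] + center[1]
        return np.array([u, v])

    # Hypothetical parameter values, for illustration only.
    uv = project_point([0.1, 0.0, 1.0], cam_pos=[0, 0, 0], rpy=[0, 0, 0],
                       focal=1.0, pixel_scale=580.0, aspect=1.0, center=(320.0, 240.0))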

The electronic device 100 also includes a storage device 124. The storage device 124 is a physical memory such as a hard drive, an optical drive, a flash drive, an array of drives, or any combinations thereof. The storage device 124 can store user data, such as audio files, video files, audio/video files, and picture files, among others. The storage device 124 can also store programming code such as device drivers, software applications, operating systems, and the like. The programming code stored to the storage device 124 may be executed by the CPU 102, GPU 108, or any other processors that may be included in the electronic device 100.

The CPU 102 may be linked through the bus 106 to cellular hardware 126. The cellular hardware 126 may be any cellular technology, for example, the 4G standard (International Mobile Telecommunications-Advanced (IMT-Advanced) Standard promulgated by the International Telecommunications Union-Radio communication Sector (ITU-R)). In this manner, the electronic device 100 may access any network 132 without being tethered or paired to another device, where the cellular hardware 126 enables access to the network 132.

The CPU 102 may also be linked through the bus 106 to WiFi hardware 128. The WiFi hardware 128 is hardware according to WiFi standards (standards promulgated as Institute of Electrical and Electronics Engineers' (IEEE) 802.11 standards). The WiFi hardware 128 enables the electronic device 100 to connect to the Internet using the Transmission Control Protocol and the Internet Protocol (TCP/IP). Accordingly, the electronic device 100 can enable end-to-end connectivity with the Internet by addressing, routing, transmitting, and receiving data according to the TCP/IP protocol without the use of another device. Additionally, a Bluetooth Interface 130 may be coupled to the CPU 102 through the bus 106. The Bluetooth Interface 130 is an interface according to Bluetooth networks (based on the Bluetooth standard promulgated by the Bluetooth Special Interest Group). The Bluetooth Interface 130 enables the electronic device 100 to be paired with other Bluetooth enabled devices through a personal area network (PAN). Accordingly, the network 132 may be a PAN. Examples of Bluetooth enabled devices include a laptop computer, desktop computer, ultrabook, tablet computer, mobile device, or server, among others.

The block diagram of FIG. 1 is not intended to indicate that the electronic device 100 is to include all of the components shown in FIG. 1. Rather, the computing system 100 can include fewer or additional components not illustrated in FIG. 1 (e.g., sensors, power management integrated circuits, additional network interfaces, etc.). The electronic device 100 may include any number of additional components not shown in FIG. 1, depending on the details of the specific implementation. Furthermore, any of the functionalities of the CPU 102 may be partially, or entirely, implemented in hardware and/or in a processor. For example, the functionality may be implemented with an application specific integrated circuit, in logic implemented in a processor, in logic implemented in a specialized graphics processing unit, or in any other device.

FIG. 2A is a diagram illustrating a target 200A to be used in calibration. As illustrated, the calibration target 200A is a checkerboard pattern. Thus, the target 200A includes a plurality of squares, where the plurality of squares include a plurality of black squares 202 and a plurality of white squares 204. In examples, for use as a calibration target, each square is 2 centimeters by 2 centimeters.

In existing calibration processes, the physical geometry of the target such as the target 200A must be known in advance and is an input to a calibration algorithm. The physical geometry may include either three dimensional coordinates for a three dimensional target or two dimensional measurements for the more commonly used, two dimensional, planar targets. In both cases, achieving an accurate target model is an expensive process. For example, printing a checkerboard pattern for a two dimensional calibration target with squares that are precisely a particular size can be an expensive process, and requires costly equipment to generate a precise and accurate target.

Additionally, an accurate and precisely printed target may be incorrectly placed for calibration by an operator. FIG. 2B is an illustration of a target 200B placed in error for use in calibration. For calibration, an operator typically places the calibration target on a platform so that cameras of the electronic device can be calibrated by capturing the calibration target using the cameras. The calibration target may be printed onto paper that is applied to a platform using an adhesive. The target 200B illustrates deformities that can occur when placing the calibration target. For example, bends 206 and 208 illustrate deformities that may be present when the calibration target is placed. In some cases, the bends 206 and 208 are a result of dimpling, puckering, or an otherwise non-smooth placement of the checkerboard target onto a platform. The bends 206 and 208 result in non-planar regions that are not expected in a typical calibration scenario. If these deformities are present when using the conventional calibration that requires prior knowledge of the size and location of the calibration target, conventional calibration may fail or provide erroneous results.

Thus, to implement conventional calibration the calibration target must be accurately and precisely printed with extremely tight tolerances to satisfy a calibration algorithm that requires these exact measurements. The present techniques overcome the need for a precise and accurate calibration target by using depth data to reconstruct the calibration target. In this manner, the exact dimensions of the calibration target are discovered, thereby eliminating the need for an accurately and precisely printed calibration target.

Since the calibration target does not need to be known in advance, the calibration according to the present techniques can be implemented in the field. As discussed above, the calibration process described herein utilizes available depth data to calculate an accurate and precise three dimensional geometry of the actual physical structure of the calibration target. In this manner, accurate and precise target creation is eliminated. Moreover, the present techniques also reduce the setup burden of the operator.

Depth data, as used herein, may refer to representations of depth information such as a depth field, a point cloud, a depth map, or a three dimensional (3D) polygonal mesh that may be used to indicate the depth of three dimensional objects within the image. While the techniques are described herein using depth data, any depth representation can be used. In some cases, depth data used to reconstruct the calibration target can be noisy. To overcome noisy depth data, the present techniques employ smoothing techniques. In embodiments, smoothing may be performed via averaging as described herein as well as sub-pixel refinement.

FIG. 3 is an illustration of a plurality of views 300 of a calibration target. The camera center is located at reference number 302. The plurality of views includes a view 304, a view 306, and a view 308. For ease of description, three views are illustrated. However, the present techniques can be implemented with any number of views.

The camera may be placed in a single frame of reference with regard to each view. In embodiments, the single frame of reference, as described herein, indicates a frame of reference relative to the camera. Specifically, the depth camera center may be located at x, y, z=(0,0,0) for all three dimensional views 304, 306, and 308 of the target. Each view may be captured by repositioning the calibration target or the camera. Once a number of views are captured, the multiple views may be aligned onto a reference location.

Put another way, as a result of the depth data obtained from the depth camera, for each capture of the calibration target, the three dimensional location (X,Y,Z) relative to the camera can be derived. The camera may be positioned at coordinate x, y, z=(0,0,0) in space, and the three dimensional coordinates of all views can be derived using the depth data from each view. In other words, the camera acts as the single frame of reference for all views. In embodiments, the present techniques may use multiple reference locations. For example, the corners of each view may be used to define reference locations for each view.
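
As a minimal sketch of this step (not taken from the disclosure), a pixel in the depth image can be back-projected into the depth camera's frame of reference, with the camera center at (0,0,0); the intrinsic values and the measured coordinates shown are hypothetical placeholders.

    import numpy as np

    def backproject(u, v, depth, fx, fy, cx, cy):
        # Convert pixel (u, v) with a measured depth into a 3D point in the
        # depth camera frame, whose origin is the camera center (0, 0, 0).
        x = (u - cx) * depth / fx
        y = (v - cy) * depth / fy
        return np.array([x, y, depth])

    # Hypothetical pinhole intrinsics and measurement, for illustration only.
    point_3d = backproject(412.0, 305.0, depth=0.83,
                           fx=580.0, fy=580.0, cx=320.0, cy=240.0)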

After the multiple views have been aligned onto a reference location, the views may be averaged to create an average, less noisy, “model.” Thus, in each of views 304, 306, and 308, there is a common point that provides the same view in each of the views. Features of the calibration target may be found in each view, and then a transform is found for each view. A closed-form formula may be used to estimate a rigid transform (camera pose) between two locations of the same object in space. This technique may be used when noise (or measurement error) is minimal. In embodiments, a Best Rigid Transform is used to obtain a transform for each view. Alternatively, the depth data from the multiple views can be averaged by tracking the camera pose as each view is captured. The camera pose, and how each camera pose relates to the other tracked camera poses, can be used to average the views into a single model. In embodiments, when the depth data is noisy, a Kinect Fusion technique may be used to detect a rigid transform for each view.
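
One common closed-form estimator of such a best rigid transform is the SVD-based (Kabsch) solution sketched below; it is offered as an illustrative assumption about how the transform could be computed, not as the specific formula used by the disclosure.

    import numpy as np

    def best_fit_rigid_transform(src, dst):
        # Least-squares rotation r and translation t such that dst ≈ r @ src + t,
        # where src and dst are (N, 3) arrays of corresponding 3D features.
        src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
        h = (src - src_c).T @ (dst - dst_c)      # cross-covariance of centered sets
        u, _, vt = np.linalg.svd(h)
        r = vt.T @ u.T
        if np.linalg.det(r) < 0:                 # guard against a reflection
            vt[-1, :] *= -1
            r = vt.T @ u.T
        t = dst_c - r @ src_c
        return r, t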

With conventional calibration techniques, averaging is not possible because there is no unique reference frame. In essence, there is no equivalent averaging process for two dimensional measurements as this is the calibration process itself. Put another way, when using a single set of measurements in a single dimension, any attempt to find the relation between two dimensional images of the target results in a three dimensional to two dimensional projection transformation, which requires knowledge of camera intrinsic parameters.

FIG. 4A is a process flow diagram of a method 400 that enables camera calibration using depth data. At block 402, stereoscopic image pairs are captured. The stereoscopic image pairs may be captured for a plurality of views. The depth data may be captured as a stereoscopic infrared (IR) image pair. An RGB image may also be captured for each view. In embodiments, an image set includes an RGB image and a depth image. In examples, features may be detected from an image set including RGB and IR images and matched. In this example, the depth and resulting 3D coordinate may be computed for each matched pair. In another example, the image set can include structured light/coded light, where the input is a depth map and a single RGB/IR image. Features may be detected on the IR/RGB image, and the corresponding 3D coordinate is then computed from the depth map.

At block 404, features are extracted from each image. Correspondence for each image set may be found for each feature in the RGB image and the depth image. As used herein, features may be points of interest on a calibration target. For example, with a checkerboard, the intersection of each square with another square may be a feature. Features may also include edges, lines, corners, and the like.
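
As an illustrative sketch of this step, OpenCV's checkerboard detector can serve as the feature extractor for both the RGB and IR images; the file names and the 9x6 inner-corner pattern size are hypothetical.

    import cv2

    # Hypothetical frames; in practice these come from the image capture mechanisms.
    rgb_gray = cv2.imread("view00_rgb.png", cv2.IMREAD_GRAYSCALE)
    ir_gray = cv2.imread("view00_ir_left.png", cv2.IMREAD_GRAYSCALE)

    # Detect the inner corners of the checkerboard in each image.
    found_rgb, rgb_corners = cv2.findChessboardCorners(rgb_gray, (9, 6))
    found_ir, ir_corners = cv2.findChessboardCorners(ir_gray, (9, 6))

    # The detector returns corners in a consistent board order, so when both
    # detections succeed, corresponding features share the same index.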

Thus, for each real world coordinate, there are two image coordinates (one derived from the depth image pair and one in the RGB image) that correspond to the real world coordinate. At block 406, three dimensional data is generated using the image pairs. In embodiments, a depth module is used to generate three dimensional data from the captured depth image. Stereo image pairs may be used to explicitly reconstruct per feature three dimensional data.
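
A minimal sketch of per-feature reconstruction from a stereo IR pair follows; the disparity-based formula assumes rectified images, and the focal length, principal point, and baseline are placeholder values rather than figures from the disclosure.

    import numpy as np

    def feature_from_stereo(u_left, v_left, u_right, fx, fy, cx, cy, baseline):
        # Depth follows from the horizontal disparity of a feature matched
        # across the rectified stereo pair; the point is then back-projected.
        disparity = u_left - u_right
        z = fx * baseline / disparity
        x = (u_left - cx) * z / fx
        y = (v_left - cy) * z / fy
        return np.array([x, y, z])

    # Hypothetical matched feature and camera parameters, for illustration only.
    p = feature_from_stereo(401.5, 288.0, 377.2, fx=580.0, fy=580.0,
                            cx=320.0, cy=240.0, baseline=0.05)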

In embodiments, subpixel refinement is performed for each detected feature. Subpixel refinement may act as a function to compute a more accurate image location of the detected feature. The refinement observes that each vector from the center of the current pixel to a second point located within a neighborhood of the pixel is orthogonal to the image gradient at that point, subject to image and measurement noise. A sum of the gradients in the neighborhood of the pixel results in a new, more accurate pixel location. This process can be performed iteratively until the coordinate location stays within a particular threshold.
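
OpenCV's cornerSubPix implements this gradient-orthogonality iteration; continuing the corner-detection sketch above (ir_gray and ir_corners are assumed to come from that snippet), an illustrative call might be:

    import cv2

    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001)
    # Iteratively refine each detected corner to sub-pixel accuracy; the 5x5
    # search window and the termination criteria are illustrative choices.
    ir_corners = cv2.cornerSubPix(ir_gray, ir_corners, (5, 5), (-1, -1), criteria)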

At block 408, the three dimensional data is refined by averaging all views. Averaging may be used to reduce the noise that may occur in each individual view, where each view includes a depth image. Because each capture is within a single frame of reference, three dimensional correspondences between the calibration targets can be found and the results can be averaged. In embodiments, the average may be found by finding a best fit rigid transform for each view to a reference view. The transform may be the best rotation and/or translation that will align the points in the current view to the reference view. In embodiments, ‘best’ is in terms of least square errors. This transform may also be called the Euclidean or Rigid transform since the shape and size of the view is preserved. For each view, a transform is computed such that

V′ = Tv(V)

where Tv is the best fit rigid transform, V is the current view, and V′ is very close to the reference view. The transformed views V′ are then averaged, which results in a calibration target model (TM). Now that a calibration target model TM has been found, each view can be replaced with a refined view:

Vrefined = Tv⁻¹(TM)
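
A rough sketch of block 408's averaging and refinement, reusing the hypothetical best_fit_rigid_transform helper from the earlier sketch, might read as follows; it assumes every view observes the same features in the same order.

    import numpy as np

    def refine_views(views, reference):
        # views: list of (N, 3) arrays of per-view 3D features in the depth camera
        # frame; reference: the (N, 3) view chosen as the reference location.
        transforms, aligned = [], []
        for v in views:
            r, t = best_fit_rigid_transform(v, reference)   # Tv for this view
            transforms.append((r, t))
            aligned.append((r @ v.T).T + t)                 # V' = Tv(V)
        target_model = np.mean(aligned, axis=0)             # average into TM
        # Vrefined = Tv^-1(TM): map the averaged model back into each view.
        refined = [(r.T @ (target_model - t).T).T for r, t in transforms]
        return target_model, refined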

At block 410, the best three dimensional to two dimensional projection is computed based on the refined three dimensional data and the corresponding two dimensional RGB features. The refined three dimensional data and corresponding two dimensional RGB features are found via the calibration target model. Accordingly, the projection is computed using a function that estimates the intrinsic camera parameters and extrinsic parameters for each of the views.

In embodiments, the extrinsic parameters may be identical for all views as a result of the single frame of reference. These extrinsic parameters describe the calibration between the RGB and depth sensors. Once the model is developed, the three dimensional data from multiple views results in a linear transformation. Additionally, in embodiments, the calibration results in the capture of eleven parameters that define the relationship between cameras of the device. The eleven parameters are the intrinsic parameters that define the relation of the RGB camera to the world. Additionally, there are six extrinsic parameters that define the relationship between the RGB and depth sensors (rotation and translation).

FIG. 4B is a process flow diagram of a method 400B that enables camera calibration using depth data. In FIG. 4B, depth data and RGB data are obtained and processed to generate a calibration target model. At block 420, the system is positioned to view the calibration target on all sensors. In embodiments, the system may be an electronic device (FIG. 1). At block 422, images are captured using the RGB sensor and the depth sensor. These images may be represented as RGB[i] and Depth[i], where i is the number of views to be obtained. At block 424, features are detected on each of RGB[i] and Depth[i], where the features are represented as RGBfeat[i] and Depthfeat[i]. Each image RGB[i] and Depth[i] may contain a plurality of features.

At block 426, the features found on RGB[i] are matched to features on Depth[i], and vice versa. The matched features may be represented by Match[i]. At block 428, a location of each three dimensional feature Depthfeat[i] is computed relative to the depth camera. The location of the three dimensional feature may be represented by 3Dfeat[i]. At block 430, the match is applied. In embodiments, the 3Dfeat[i] and Match[i] are used to derive a two dimensional coordinate in the RGB[i] and a corresponding three dimensional coordinate in the Depth[i].

At block 432, it is determined if i is less than a threshold. If i is less than a threshold, process flow returns to block 420 where the system is positioned to capture another view. If i is greater than a threshold, process flow continues to block 434, where sub-pixel refinement is performed. At block 434, each feature in 3Dfeat[i] is processed via sub-pixel refinement. The result of the sub-pixel refinement is a refinement of the three dimensional features across views to obtain a refined pixel location for each feature. At block 436, a refined feature match RefinedFeatMatch is found by using Refined3Dfeat[i] and 3DMatch[i] to derive a two dimensional coordinate in the RGB[i] and a corresponding three dimensional coordinate in the Depth[i].
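
The loop over blocks 420-430 can be summarized with the sketch below. The capture, detect, match, and to_3d callables are hypothetical stand-ins for the sensor driver and the feature routines sketched earlier, not functions defined by the disclosure.

    import numpy as np

    def collect_views(num_views, capture, detect, match, to_3d):
        # Gather matched 2D RGB coordinates and 3D depth-frame coordinates per view.
        points_2d, points_3d = [], []
        for i in range(num_views):
            rgb, depth = capture(i)                            # block 422: RGB[i], Depth[i]
            rgb_feat, depth_feat = detect(rgb), detect(depth)  # block 424
            pairs = match(rgb_feat, depth_feat)                # block 426: Match[i]
            # blocks 428-430: 3D location of each matched depth feature, paired
            # with its 2D coordinate in the RGB image.
            points_2d.append(np.array([rgb_feat[a] for a, b in pairs],
                                      dtype=np.float32))
            points_3d.append(np.array([to_3d(depth, depth_feat[b]) for a, b in pairs],
                                      dtype=np.float32))
        return points_2d, points_3d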

At block 438, calibration data is obtained by computing the best three dimensional to two dimensional projection based on the refined three dimensional data Refined3DFeat. In embodiments, an Open Source Computer Vision (OpenCV) calibrate function may be used to obtain the calibration data. OpenCV may refer to the OpenCV Specification, released Dec. 21, 2015. At block 440, the calibration data may be stored and applied to data captured by the RGB and depth sensors.
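
Assuming the per-view data has been collected as above, the OpenCV call could look like the following sketch; refined_3d, rgb_2d, and image_size are placeholder names for the refined 3D features, the matched RGB features, and the RGB sensor resolution.

    import cv2
    import numpy as np

    # refined_3d: list of (N, 3) float32 arrays (one per view, Refined3DFeat);
    # rgb_2d: list of (N, 2) float32 arrays of the matched RGB features;
    # image_size: (width, height) of the RGB sensor.
    rms, camera_matrix, dist_coeffs, rvecs, tvecs = cv2.calibrateCamera(
        refined_3d, rgb_2d, image_size, None, None)

    # camera_matrix/dist_coeffs hold the RGB intrinsics; rvecs/tvecs give each
    # view's rotation and translation relative to the depth camera frame, which
    # express the depth-to-RGB extrinsics under the single frame of reference.
    np.savez("calibration.npz", K=camera_matrix, dist=dist_coeffs)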

FIG. 5 is a line graph 500 illustrating error with traditional calibration and the calibration techniques described herein. The line graph 500 includes an x-axis 502 representing the number of processed images or views. The line graph 500 also includes a y-axis 504 representing calibration errors in pixels. The traditional calibration is represented by the line 506, and the averaged calibration is represented by the line 508. As illustrated, the calibration error is lower for the averaged calibration when compared with the traditional calibration, regardless of the number of processed views or images.

As illustrated, the present techniques result in a better absolute calibration error value (approximately 1.6 pixel error vs. approximately 5.4 pixels with the traditional method). In addition, convergence associated with the present techniques is faster, where three images are sufficient to reduce calibration errors, and two images are the absolute minimum required for mathematical correctness.

The present techniques have better tolerance to operator errors and also employ noise reduction techniques that cannot be employed by traditional calibration methods, neither for 2D nor 3D targets. Moreover, the present techniques have a better resilience to noise, as there are fewer unknown parameters to compute since the depth camera frame of reference is used for all scenes.

FIG. 6 is a block diagram showing media 600 that contains logic for camera calibration using depth data. The media 600 may be a computer-readable medium, including a non-transitory medium that stores code that can be accessed by a processor 602 over a computer bus 604. For example, the computer-readable media 600 can be a volatile or non-volatile data storage device. The media 600 can also be a logic unit, such as an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA), or an arrangement of logic gates implemented in one or more integrated circuits, for example.

The media 600 may include modules 606-614 configured to perform the techniques described herein. For example, an image capture module 606 may be configured to capture images using the depth camera and the RGB camera.

A correspondence module 608 may be configured to extract features and find the correspondence between features. A depth module 610 may be configured to generate depth data for each feature based on the stereoscopic data captured for each feature point. An averaging module 612 averages depth data from all views. A calibration module 614 completes calibration by obtaining a three dimensional to two dimensional best fit projection. In some embodiments, the modules 606-614 may be modules of computer code configured to direct the operations of the processor 602.

The block diagram of FIG. 6 is not intended to indicate that the media 600 is to include all of the components shown in FIG. 6. Further, the media 600 may include any number of additional components not shown in FIG. 6, depending on the details of the specific implementation.

Example 1 is an apparatus for calibrating multiple cameras. The apparatus includes an image capture module to capture depth data and sensor data for a plurality of views; an extraction module to extract a first plurality of features from the depth data and a second plurality of features from the sensor data for each view; a correspondence module to locate corresponding features in the first plurality of features and the second plurality of features for each view; a depth module to generate three-dimensional data for each feature of the first plurality of features for each view; and a calibration module to calibrate the multiple cameras by matching the generated three dimensional data with the corresponding features in the first plurality of features and the second plurality of features.

Example 2 includes the apparatus of example 1, including or excluding optional features. In this example, a controller is to average the three-dimensional data across multiple views, and the calibration module is to calibrate the multiple cameras by matching the averaged three dimensional data with the corresponding features in the first plurality of features and the second plurality of features.

Example 3 includes the apparatus of any one of examples 1 to 2, including or excluding optional features. In this example, the apparatus includes averaging the three dimensional data across multiple views by calculating a best fit rigid transform from each view of a plurality of views to a reference view.

Example 4 includes the apparatus of any one of examples 1 to 3, including or excluding optional features. In this example, the apparatus includes averaging the three dimensional data by tracking a camera pose as each view of the plurality of views is captured and using the camera pose to average the views into a single view.

Example 5 includes the apparatus of any one of examples 1 to 4, including or excluding optional features. In this example, the depth data is obtained from a stereoscopic infrared image pair.

Example 6 includes the apparatus of any one of examples 1 to 5, including or excluding optional features. In this example, the apparatus includes applying subpixel refinement to each feature of the plurality of features.

Example 7 includes the apparatus of any one of examples 1 to 6, including or excluding optional features. In this example, features are detected by observing points of interest in each view of the plurality of views.

Example 8 includes the apparatus of any one of examples 1 to 7, including or excluding optional features. In this example, calibrating the multiple views results in a calibration target model used to transform between world and image coordinates.

Example 9 includes the apparatus of any one of examples 1 to 8, including or excluding optional features. In this example, the plurality of views include multiple views of a calibration target. Optionally, calibration is performed without prior knowledge of the calibration target.

Example 10 is a method for calibrating multiple cameras. The method includes capturing depth data and sensor data for a plurality of views; extracting a plurality of corresponding features from the depth data and the sensor data; generating three-dimensional data for each feature of the plurality of corresponding features; and calibrating the multiple cameras by calculating a projection based on the three-dimensional data and the plurality of corresponding features.

Example 11 includes the method of example 10, including or excluding optional features. In this example, the three-dimensional data is averaged across the multiple views, and the multiple cameras are calibrated by calculating a projection based on the averaged three-dimensional data and the plurality of corresponding features across multiple views.

Example 12 includes the method of any one of examples 10 to 11, including or excluding optional features. In this example, the method includes averaging the three dimensional data across multiple views by calculating a best fit rigid transform from each view of a plurality of views to a reference view.

Example 13 includes the method of any one of examples 10 to 12, including or excluding optional features. In this example, the method includes averaging the three dimensional data by tracking a camera pose as each view of the plurality of views is captured and using the camera pose to average the views into a single view.

Example 14 includes the method of any one of examples 10 to 13, including or excluding optional features. In this example, the depth data is obtained from a stereoscopic infrared image pair.

Example 15 includes the method of any one of examples 10 to 14, including or excluding optional features. In this example, the sensor data is RGB sensor data.

Example 16 includes the method of any one of examples 10 to 15, including or excluding optional features. In this example, the depth data is obtained from structured light.

Example 17 includes the method of any one of examples 10 to 16, including or excluding optional features. In this example, the plurality of views include multiple views of a calibration target.

Example 18 includes the method of any one of examples 10 to 17, including or excluding optional features. In this example, calibration is performed without prior knowledge of a calibration target.

Example 19 includes the method of any one of examples 10 to 18, including or excluding optional features. In this example, averaging the three dimensional data and applying a subpixel refinement to the plurality of features results in a smoothing of the depth data.

Example 20 is a system for calibrating multiple cameras. The system includes a depth camera and a sensor to capture images; a memory configured to receive image data; and a processor coupled to the memory, depth camera, and sensor, the processor to: capture depth data and sensor data for a plurality of views; extract a plurality of corresponding features from the depth data and the sensor data; generate three-dimensional data for each feature of the plurality of corresponding features; and calibrate the multiple cameras by calculating a projection based on the three-dimensional data and the plurality of corresponding features.

Example 21 includes the system of example 20, including or excluding optional features. In this example, the processor is to average the three-dimensional data for each view, and the calibration module is to calibrate the multiple cameras by matching the averaged three dimensional data with the corresponding features in the first plurality of features and the second plurality of features.

Example 22 includes the system of any one of examples 20 to 21, including or excluding optional features. In this example, the system includes averaging the three dimensional data across multiple views by calculating a best fit rigid transform from each view of a plurality of views to a reference view.

Example 23 includes the system of any one of examples 20 to 22, including or excluding optional features. In this example, the system includes averaging the three dimensional data by tracking a camera pose as each view of the plurality of views is captured and using the camera pose to average the views into a single view.

Example 24 includes the system of any one of examples 20 to 23, including or excluding optional features. In this example, the depth data is obtained from a stereoscopic infrared image pair.

Example 25 includes the system of any one of examples 20 to 24, including or excluding optional features. In this example, the system includes applying subpixel refinement to each feature of the plurality of features.

Example 26 includes the system of any one of examples 20 to 25, including or excluding optional features. In this example, features are detected by observing points of interest in each view of the plurality of views.

Example 27 includes the system of any one of examples 20 to 26, including or excluding optional features. In this example, calibrating the multiple views results in a calibration target model used to transform between world and image coordinates.

Example 28 includes the system of any one of examples 20 to 27, including or excluding optional features. In this example, the plurality of views include multiple views of a calibration target. Optionally, calibration is performed without prior knowledge of the calibration target.

Example 29 is an apparatus for calibrating multiple cameras. The apparatus includes an image capture module to capture depth data and sensor data for a plurality of views; a means to extract a first plurality of features from the depth data and a second plurality of features from the sensor data for each view; a means to locate corresponding features in the first plurality of features and the second plurality of features for each view; a means to generate three-dimensional data for each feature of the first plurality of features for each view; and a means to calibrate the multiple cameras by matching the generated three dimensional data with the corresponding features in the first plurality of features and the second plurality of features.

Example 30 includes the apparatus of example 29, including or excluding optional features. In this example, a means to average views is to average the three-dimensional data for each view, and the means to calibrate the multiple cameras is to match the averaged three dimensional data with the corresponding features in the first plurality of features and the second plurality of features.

Example 31 includes the apparatus of any one of examples 29 to 30, including or excluding optional features. In this example, the apparatus includes averaging the three dimensional data across multiple views by calculating a best fit rigid transform from each view of a plurality of views to a reference view.

Example 32 includes the apparatus of any one of examples 29 to 31, including or excluding optional features. In this example, the apparatus includes averaging the three dimensional data by tracking a camera pose as each view of the plurality of views is captured and using the camera pose to average the views into a single view.

Example 33 includes the apparatus of any one of examples 29 to 32, including or excluding optional features. In this example, the depth data is obtained from a stereoscopic infrared image pair.

Example 34 includes the apparatus of any one of examples 29 to 33, including or excluding optional features. In this example, the apparatus includes applying subpixel refinement to each feature of the plurality of features.

Example 35 includes the apparatus of any one of examples 29 to 34, including or excluding optional features. In this example, features are detected by observing points of interest in each view of the plurality of views.

Example 36 includes the apparatus of any one of examples 29 to 35, including or excluding optional features. In this example, calibrating the multiple views results in a calibration target model used to transform between world and image coordinates.

Example 37 includes the apparatus of any one of examples 29 to 36, including or excluding optional features. In this example, the plurality of views include multiple views of a calibration target. Optionally, calibration is performed without prior knowledge of the calibration target.

Example 38 is at least one machine readable medium comprising a plurality of instructions. The computer-readable medium includes instructions that direct the processor to capture depth data and sensor data for a plurality of views; extract a plurality of corresponding features from the depth data and the sensor data; generate three-dimensional data for each feature of the plurality of corresponding features; and calibrate the multiple cameras by calculating a projection based on the three-dimensional data and the plurality of corresponding features.

Example 39 includes the computer-readable medium of example 38, including or excluding optional features. In this example, the three-dimensional data is averaged across the multiple views, and the multiple cameras are calibrated by calculating a projection based on the averaged three-dimensional data and the plurality of corresponding features across multiple views.

Example 40 includes the computer-readable medium of any one of examples 38 to 39, including or excluding optional features. In this example, the computer-readable medium includes averaging the three dimensional data across multiple views by calculating a best fit rigid transform from each view of a plurality of views to a reference view.

Example 41 includes the computer-readable medium of any one of examples 38 to 40, including or excluding optional features. In this example, the computer-readable medium includes averaging the three dimensional data by tracking a camera pose as each view of the plurality of views is captured and using the camera pose to average the views into a single view.

Example 42 includes the computer-readable medium of any one of examples 38 to 41, including or excluding optional features. In this example, the depth data is obtained from a stereoscopic infrared image pair.

Example 43 includes the computer-readable medium of any one of examples 38 to 42, including or excluding optional features. In this example, the sensor data is RGB sensor data.

Example 44 includes the computer-readable medium of any one of examples 38 to 43, including or excluding optional features. In this example, the depth data is obtained from structured light.

Example 45 includes the computer-readable medium of any one of examples 38 to 44, including or excluding optional features. In this example, the plurality of views include multiple views of a calibration target.

Example 46 includes the computer-readable medium of any one of examples 38 to 45, including or excluding optional features. In this example, calibration is performed without prior knowledge of a calibration target.

Example 47 includes the computer-readable medium of any one of examples 38 to 46, including or excluding optional features. In this example, averaging the three dimensional data and applying a subpixel refinement to the plurality of features results in a smoothing of the depth data.

It is to be understood that specifics in the aforementioned examples may be used anywhere in one or more embodiments. For instance, all optional features of the computing device described above may also be implemented with respect to either of the methods or the computer-readable medium described herein. Furthermore, although flow diagrams and/or state diagrams may have been used herein to describe embodiments, the techniques are not limited to those diagrams or to corresponding descriptions herein. For example, flow need not move through each illustrated box or state or in exactly the same order as illustrated and described herein.

The present techniques are not restricted to the particular details listed herein. Indeed, those skilled in the art having the benefit of this disclosure will appreciate that many other variations from the foregoing description and drawings may be made within the scope of the present techniques. Accordingly, it is the following claims including any amendments thereto that define the scope of the present techniques.

What is claimed is:
1. An apparatus for calibrating multiple cameras, comprising: an image capture module to capture depth data and sensor data for a plurality of views; an extraction module to extract a first plurality of features from the depth data and a second plurality of features from the sensor data for each view; a correspondence module to locate corresponding features in the first plurality of features and the second plurality of features for each view; a depth module to generate three-dimensional data for each feature of the first plurality of features for each view; and a calibration module to calibrate the multiple cameras by matching the generated three dimensional data with the corresponding features in the first plurality of features and the second plurality of features.
2. The apparatus of claim 1, wherein a controller is to average the three-dimensional data across multiple views, and the calibration module is to calibrate the multiple cameras by matching the averaged three dimensional data with the corresponding features in the first plurality of features and the second plurality of features.
3. The apparatus of claim 1, comprising averaging the three dimensional data across multiple views by calculating a best fit rigid transform from each view of a plurality of views to a reference view.
4. The apparatus of claim 1, comprising averaging the three dimensional data by tracking a camera pose as each view of the plurality of views is captured and using the camera pose to average the views into a single view.
5. The apparatus of claim 1, wherein the depth data is obtained from a stereoscopic infrared image pair.
6. The apparatus of claim 1, comprising applying subpixel refinement to each feature of the plurality of features.
7. The apparatus of claim 1, wherein features are detected by observing points of interest in each view of the plurality of views.
8. The apparatus of claim 1, wherein calibrating the multiple views results in a calibration target model used to transform between world and image coordinates.
9. The apparatus of claim 1, wherein the plurality of views include multiple views of a calibration target.
10. The apparatus of claim 9, wherein calibration is performed without prior knowledge of the calibration target.
11. A method for calibrating multiple cameras, comprising: capturing depth data and sensor data for a plurality of views; extracting a plurality of corresponding features from the depth data and the sensor data; generating three-dimensional data for each feature of the plurality of corresponding features; and calibrating the multiple cameras by calculating a projection based on the three-dimensional data and the plurality of corresponding features.
12. The method of claim 11, wherein the three-dimensional data is averaged across the multiple views, and the multiple cameras are calibrated by calculating a projection based on the averaged three-dimensional data and the plurality of corresponding features across multiple views.
13. The method of claim 11, comprising averaging the three dimensional data across multiple views by calculating a best fit rigid transform from each view of a plurality of views to a reference view.
14. The method of claim 11, comprising averaging the three dimensional data by tracking a camera pose as each view of the plurality of views is captured and using the camera pose to average the views into a single view.
15. The method of claim 11, wherein the depth data is obtained from a stereoscopic infrared image pair.
16. A system for calibrating multiple cameras, comprising: a depth camera and a sensor to capture images; a memory configured to receive image data; and a processor coupled to the memory, depth camera, and sensor, the processor to: capture depth data and sensor data for a plurality of views; extract a plurality of corresponding features from the depth data and the sensor data; generate three-dimensional data for each feature of the plurality of corresponding features; and calibrate the multiple cameras by calculating a projection based on the three-dimensional data and the plurality of corresponding features.
17. The system of claim 16, wherein the processor is to average the three-dimensional data for each view, and the calibration module is to calibrate the multiple cameras by matching the averaged three dimensional data with the corresponding features in the first plurality of features and the second plurality of features.
18. The system of claim 16, comprising averaging the three dimensional data across multiple views by calculating a best fit rigid transform from each view of a plurality of views to a reference view.
19. The system of claim 16, comprising averaging the three dimensional data by tracking a camera pose as each view of the plurality of views is captured and using the camera pose to average the views into a single view.
20. The system of claim 16, wherein the depth data is obtained from a stereoscopic infrared image pair.
21. The system of claim 16, comprising applying subpixel refinement to each feature of the plurality of features.
22. At least one machine readable medium comprising a plurality of instructions that, in response to being executed on a computing device, cause the computing device to: capture depth data and sensor data for a plurality of views; extract a plurality of corresponding features from the depth data and the sensor data; generate three-dimensional data for each feature of the plurality of corresponding features; and calibrate the multiple cameras by calculating a projection based on the three-dimensional data and the plurality of corresponding features.
23. The computer readable medium of claim 22, wherein the three-dimensional data is averaged across the multiple views, and the multiple cameras are calibrated by calculating a projection based on the averaged three-dimensional data and the plurality of corresponding features across multiple views.
24. The computer readable medium of claim 22, comprising averaging the three dimensional data across multiple views by calculating a best fit rigid transform from each view of a plurality of views to a reference view.
25. The computer readable medium of claim 22, comprising averaging the three dimensional data by tracking a camera pose as each view of the plurality of views is captured and using the camera pose to average the views into a single view.